LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA.

Authors

  • Michael Brudno
  • Chuong B Do
  • Gregory M Cooper
  • Michael F Kim
  • Eugene Davydov
  • Eric D Green
  • Arend Sidow
  • Serafim Batzoglou
Abstract

To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences, and accurate enough to correctly align the conserved biological features between distant species. We present LAGAN, a system for rapid global alignment of two homologous genomic sequences, and Multi-LAGAN, a system for multiple global alignment of genomic sequences. We tested our systems on a data set consisting of greater than 12 Mb of high-quality sequence from 12 vertebrate species. All the sequence was derived from the genomic region orthologous to an approximately 1.5-Mb region on human chromosome 7q31.3. We found that both LAGAN and Multi-LAGAN compare favorably with other leading alignment methods in correctly aligning protein-coding exons, especially between distant homologs such as human and chicken, or human and fugu. Multi-LAGAN produced the most accurate alignments, while requiring just 75 minutes on a personal computer to obtain the multiple alignment of all 12 sequences. Multi-LAGAN is a practical method for generating multiple alignments of long genomic sequences at any evolutionary distance. Our systems are publicly available at http://lagan.stanford.edu.
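As a point of reference for the term "global alignment" used in the abstract, the sketch below scores an end-to-end alignment of two short sequences with the textbook Needleman-Wunsch dynamic program. It is not the LAGAN or Multi-LAGAN algorithm, which the abstract does not describe; the function name `global_alignment_score` and the scoring values (match, mismatch, linear gap penalty) are assumptions chosen only for illustration. The quadratic running time of this basic approach hints at why tools built for megabase-scale sequences need something more efficient.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of strings a and b.

    Textbook Needleman-Wunsch with a linear gap penalty.
    The scoring values are illustrative assumptions, not LAGAN's parameters.
    """
    # prev[j] holds the best score for aligning the processed prefix of a
    # against b[:j]; the first row aligns the empty prefix against gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] against the empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # a[i-1] aligned to a gap
                curr[j - 1] + gap,           # b[j-1] aligned to a gap
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the table are kept in memory, so space grows with the shorter dimension, but time still grows with the product of the two sequence lengths.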

Similar resources

Glocal alignment: finding rearrangements during alignment

MOTIVATION To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences, and accurate enough to correctly align the conserved biological features between distant species. The two main classes of pairwise alignments are global alignment, where one string is transformed into the other, and local alignment, wher...

Phylo-VISTA: interactive visualization of multiple DNA sequence alignments

MOTIVATION The power of multi-sequence comparison for biological discovery is well established. The need for new capabilities to visualize and compare cross-species alignment data is intensified by the growing number of genomic sequence datasets being generated for an ever-increasing number of organisms. To be efficient these visualization algorithms must support the ability to accommodate cons...

Evolution of cis-regulatory sequences in Drosophila: a systematic approach

Numerous tools have been developed to align genomic sequences. However, their relative performance in specific applications remains poorly characterized. Alignments of protein-coding sequences typically have been benchmarked against "correct" alignments inferred from structural data. For noncoding sequences, where such independent validation is lacking, simulation provides an effective means to...

Whole genome alignments using MPI-LAGAN

Advances in sequencing technologies have substantially increased the number of fully sequenced genomes. Alignment algorithms play a crucial role in analyzing whole genomes, identifying similar and conserved regions between pairs of genomes, leading to annotation of genomes with site-specific properties and functions. In this work we introduce a parallel algorithm for a widely used whole genome ...

Accurate anchoring alignment of divergent sequences

MOTIVATION Obtaining high quality alignments of divergent homologous sequences for cross-species sequence comparison remains a challenge. RESULTS We propose a novel pairwise sequence alignment algorithm, ACANA (ACcurate ANchoring Alignment), for aligning biological sequences at both local and global levels. Like many fast heuristic methods, ACANA uses an anchoring strategy. However, unlike ot...

Journal:
  • Genome Research

Volume 13, Issue 4

Pages -

Publication date: 2003